Making maps in R: `maps` package

maps package contains a lot of outlines of continents, countries, states, and counties
ggplot2’s map_data() function puts these outlines in data frame format, which then allows us to plot them with ggplot()

Example: Drawing CA and its counties (1)

library(tidyverse)
library(maps)

county_data <- map_data("county")
CA_data <- county_data %>% filter(region == "california")
head(CA_data)

##        long      lat group order     region subregion
## 1 -121.4785 37.48290   157  6965 california   alameda
## 2 -121.5129 37.48290   157  6966 california   alameda
## 3 -121.8853 37.48290   157  6967 california   alameda
## 4 -121.8968 37.46571   157  6968 california   alameda
## 5 -121.9254 37.45998   157  6969 california   alameda
## 6 -121.9483 37.47717   157  6970 california   alameda

Example: Drawing CA and its counties (2)

County outlines are drawn using geom_polygon.

ggplot(data = CA_data) + 
    geom_polygon(mapping = aes(x = long, y = lat, group = group))

Example: Drawing CA and its counties (3)

coord_quickmap() preserves the aspect ratio of the map.

ggplot(data = CA_data) + 
    geom_polygon(mapping = aes(x = long, y = lat, group = group)) + 
    coord_quickmap()

Example: Drought levels in CA (1)

drought_data <- read_csv("Drought data.csv")
head(drought_data, n = 3)

## # A tibble: 3 x 2
##   County  Drought_percent
##   <chr>             <dbl>
## 1 alameda             100
## 2 alpine              100
## 3 amador              100

head(CA_data, n = 3)

##        long     lat group order     region subregion
## 1 -121.4785 37.4829   157  6965 california   alameda
## 2 -121.5129 37.4829   157  6966 california   alameda
## 3 -121.8853 37.4829   157  6967 california   alameda

Our drought data and mapping information are in different datasets!

Joining datasets

Sometimes our data are spread across different datasets, making it difficult to answer some questions.

Question: Who scored the highest in English in each class?

bio

##     Name Class
## 1 Andrew     A
## 2   John     B
## 3   Mary     A
## 4   Jane     B

scores

##     Name Subject Score
## 1   John English    76
## 2 Andrew English    66
## 3   John    Math    85
## 4   Mary English    71

Joining datasets (using `dplyr`)

library(dplyr)
left_join(bio, scores, by = "Name")

Joining datasets (using `dplyr`)

library(dplyr)
left_join(bio, scores, by = "Name")

##     Name Class Subject Score
## 1 Andrew     A English    66
## 2   John     B English    76
## 3   John     B    Math    85
## 4   Mary     A English    71
## 5   Jane     B    <NA>    NA

Joining datasets (using `dplyr`)

Question: Who scored the highest in English in each class?

library(dplyr)
bio %>% left_join(scores, by = "Name") %>%
    filter(Subject == "English") %>%
    group_by(Class) %>%
    top_n(1, Score)

## # A tibble: 2 x 4
## # Groups:   Class [2]
##   Name  Class Subject Score
##   <chr> <chr> <chr>   <dbl>
## 1 John  B     English    76
## 2 Mary  A     English    71

Example: Drought levels in CA (1)

drought_data <- read_csv("Drought data.csv")
head(drought_data, n = 3)

## # A tibble: 3 x 2
##   County  Drought_percent
##   <chr>             <dbl>
## 1 alameda             100
## 2 alpine              100
## 3 amador              100

head(CA_data, n = 3)

##        long     lat group order     region subregion
## 1 -121.4785 37.4829   157  6965 california   alameda
## 2 -121.5129 37.4829   157  6966 california   alameda
## 3 -121.8853 37.4829   157  6967 california   alameda

Our drought data and mapping information are in different datasets!

Example: Drought levels in CA (2)

Solution: Join the datasets together.

combined_data <- CA_data %>%
    left_join(drought_data, by = c("subregion" = "County"))
head(combined_data)

##        long      lat group order     region subregion Drought_percent
## 1 -121.4785 37.48290   157  6965 california   alameda             100
## 2 -121.5129 37.48290   157  6966 california   alameda             100
## 3 -121.8853 37.48290   157  6967 california   alameda             100
## 4 -121.8968 37.46571   157  6968 california   alameda             100
## 5 -121.9254 37.45998   157  6969 california   alameda             100
## 6 -121.9483 37.47717   157  6970 california   alameda             100

Example: Drought levels in CA (3)

Map the fill attribute of geom_polygon to the Drought_percent column.

ggplot(data = combined_data) + 
    geom_polygon(mapping = aes(x = long, y = lat, 
            group = group, fill = Drought_percent)) + 
    coord_quickmap()

Example: Drought levels in CA (4)

Use scale_fill_distiller to define a more appropriate color scale.

ggplot(data = combined_data) + 
    geom_polygon(mapping = aes(x = long, y = lat, 
            group = group, fill = Drought_percent)) + 
    scale_fill_distiller(palette = "YlOrRd", direction = 1) +
    coord_quickmap()

Today’s dataset: 2016 US Presidential Elections

Optional material

Other types of joins

Inner join: Matches pairs of observations with equal keys, drops everything else. Hence, only keeps observations which appear in both datasets.

Other types of joins

After matching pairs of observations with equal keys…

Left join: Keeps all observations in the left dataset, drops everything unmatched in the right dataset.
Right join: Keeps all observations in the right dataset, drops everything unmatched in the left dataset.
Full join: Keeps all observations in both datasets.

STATS 32 Session 9: Data joining and maps

Recap of week 4

Agenda for today

Making maps in R: `maps` package

Example: Drawing CA and its counties (1)

Example: Drawing CA and its counties (2)

Example: Drawing CA and its counties (3)

Example: Drought levels in CA (1)

Joining datasets

Joining datasets (using `dplyr`)

Joining datasets (using `dplyr`)

Joining datasets (using `dplyr`)

Example: Drought levels in CA (1)

Example: Drought levels in CA (2)

Example: Drought levels in CA (3)

Example: Drought levels in CA (4)

Today’s dataset: 2016 US Presidential Elections

Other types of joins

Other types of joins

STATS 32 Session 9: Data joining and maps

Recap of week 4

Agenda for today

Making maps in R: maps package

Example: Drawing CA and its counties (1)

Example: Drawing CA and its counties (2)

Example: Drawing CA and its counties (3)

Example: Drought levels in CA (1)

Joining datasets

Joining datasets (using dplyr)

Joining datasets (using dplyr)

Joining datasets (using dplyr)

Example: Drought levels in CA (1)

Example: Drought levels in CA (2)

Example: Drought levels in CA (3)

Example: Drought levels in CA (4)

Today’s dataset: 2016 US Presidential Elections

Other types of joins

Other types of joins

Making maps in R: `maps` package

Joining datasets (using `dplyr`)

Joining datasets (using `dplyr`)

Joining datasets (using `dplyr`)